1. Project Objective

The primary objective of this project was to develop a DIY multi-camera CCTV system built on Raspberry Pi hardware and integrating computer vision capabilities. More specifically, IOT Sauron aims to give cameras the ability to track and reposition themselves, actively adapting to the location of users in frame. In addition, the project focused on making the camera streams web-broadcastable, enabling consumers to monitor the camera feed while at work or in other remote locations. Altogether, this provides a working prototype security system that can be assembled and used within a home environment.

2. Introduction

EyeOT/IOT Sauron is a multi-camera streaming system developed on the Raspberry Pi operating system. The system features four servo motors that control camera orientation in the X-Y plane, along with two onboard cameras that track the motion of humans in frame. Meanwhile, the data is streamed through HTTP servers on the Raspberry Pi, enabling all users on the given Wi-Fi network to access the streams.

Figure 1: Video of the Embedded OS demonstration.

3. Design and Testing

The design for this project involves a comprehensive system architecture in which several components exchange data with each other. Each Raspberry Pi board is responsible for reading a camera feed and controlling its servo motors. The main board (a Raspberry Pi 3) functions as both a server and a client: it receives streams from the other Raspberry Pi boards over TCP while also posting its own stream information via HTTP. Figure 2 shows the system architecture diagram used for this project.

Figure 2: Overall System Architecture Diagram for Project.

3.1 Computer Vision

Computer vision played a fundamental role in this project, enabling effective object detection and tracking. The OpenCV libraries were utilized to provide a lightweight solution for running vision algorithms on the Raspberry Pi operating system. Specifically, this project features two modes of operation for computer vision: face detection and blue-colour detection.

3.1.1 Detection of Faces

Detecting faces using OpenCV involves a multi-stage process. First, to ensure accurate detection, each camera frame is converted to a grayscale image. A Haar cascade model is then applied to detect faces in the frame. These cascades contain a collection of Haar-like features, which represent patterns of intensity changes in rectangular areas [1]. For instance, the mouth region comprises an upper lip with a lower intensity than the lower lip. Similarly, the forehead tends to have a higher intensity than the hair above it, as shown in Figure 3. By combining these features, faces can be detected, as illustrated in Figure 4 (a brief code sketch follows the figures).

Figure 3: Haar Features as Identified on a Face [2].

Figure 4: Face Detection Through Haar-like Features in OpenCV.
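As an illustration, a minimal sketch of this detection pipeline is shown below. It assumes the stock haarcascade_frontalface_default.xml model bundled with OpenCV; the exact capture setup and detection parameters used in the project may differ.

```python
import cv2

# Load the pre-trained Haar cascade for frontal faces (bundled with OpenCV).
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

cap = cv2.VideoCapture(0)  # first attached camera

while True:
    ok, frame = cap.read()
    if not ok:
        break

    # Haar cascades operate on intensity, so convert to grayscale first.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Detect faces; scaleFactor and minNeighbors trade accuracy for speed.
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

    cv2.imshow("Faces", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```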

3.1.2 Colour Detection

In addition to face detection, the system is also capable of detecting coloured objects. This was necessary because face detection, unlike coloured-object detection, is not robust to orientation changes. HSV values are particularly useful for vision-based tasks, aiding in colour detection and region bounding. Our approach to colour detection involved filtering the image based on a specific blue hue range (experimentally determined to be between 98 and 139), as shown in Figure 5. When applying blue detection, multiple centroids and clusters appear within the HSV image, so it was necessary to select the largest cluster in the frame as the target for the servo motors to track. Figure 6 shows the relevant centroid corresponding to the bin object, positioned in the center of the frame; a brief code sketch follows Figure 6.

Figure 5: Hue range for colours with blue corresponding to values between 98-150.

Figure 6: Blue Colours with Relevant Centroids Drawn and Selected.
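The sketch below illustrates this approach under stated assumptions: frames arrive in BGR order, the hue window is 98-139, and OpenCV contour analysis picks the largest blue cluster and computes its centroid via image moments. The saturation and value bounds are illustrative placeholders, not the project's tuned thresholds.

```python
import cv2
import numpy as np

def largest_blue_centroid(frame_bgr):
    """Return the (x, y) centroid of the largest blue region, or None."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)

    # Keep pixels whose hue falls in the experimentally determined blue range.
    # The saturation/value lower bounds (80, 60) are illustrative guesses.
    mask = cv2.inRange(hsv, np.array([98, 80, 60]), np.array([139, 255, 255]))

    # Each connected blob becomes a contour; pick the largest by area.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    biggest = max(contours, key=cv2.contourArea)

    # Image moments give the centroid of the selected cluster.
    m = cv2.moments(biggest)
    if m["m00"] == 0:
        return None
    return int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])
```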

3.2 Servo Motors

To control the motion and position of the cameras, a system comprising four servos and two gimbals was constructed. Each camera gimbal requires two motors to actively control the movement of the camera: one servo adjusts the camera's position along the horizontal (x) axis, while the other handles adjustments along the vertical (y) axis. Figure 7 below shows the camera gimbal and servo motor setup used in this project.

Figure 7: Servo Motor and Gimbal Architecture for Controlling X-Y Values.

3.2.1 Underlying Servo Motor Mechanisms

The primary aim of the servo motors is to center the detected object in the middle of the frame (refer to Figure 8). The underlying system utilizes a pair of PD controllers, which provide responsive and stable control and help ensure smooth motion. The proportional component makes the system respond in proportion to the error between the detected object's position and the center of the frame, while the derivative component dampens oscillations and improves stability by anticipating future trends in the error signal.
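A minimal sketch of one axis of such a controller is given below; the gains and the example centroid position are illustrative assumptions, not the tuned values used in the project.

```python
class PDController:
    """Single-axis PD controller for centering a target in the frame."""

    def __init__(self, kp, kd):
        self.kp = kp            # proportional gain
        self.kd = kd            # derivative gain
        self.prev_error = 0.0

    def update(self, error, dt):
        # Proportional term: respond in proportion to the current error.
        p = self.kp * error
        # Derivative term: damp oscillations via the error's rate of change.
        d = self.kd * (error - self.prev_error) / dt if dt > 0 else 0.0
        self.prev_error = error
        return p + d

# Illustrative usage for the pan axis of a 640-pixel-wide frame:
pan = PDController(kp=0.05, kd=0.01)      # placeholder gains
centroid_x = 400                          # example detected x-position (pixels)
error_x = centroid_x - 320                # offset from the frame centre
correction = pan.update(error_x, dt=0.05)
print(f"pan correction: {correction:.2f}")
```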

To supply Pulse Width Modulation (PWM) signals to the servos, we leverage the GPIO (General Purpose Input/Output) pins of the Raspberry Pi, coupled with the pigpio library. This combination enables us to generate hardware-timed PWM signals, ensuring smoother and less jittery servo motion. Hardware-timed PWM gives us more precise control over the servo motors, which is crucial for accurate positioning and movement within our system; a brief sketch follows Figure 8.

Figure 8: Servo Motor Attempts to Move Centroid to Target Location.
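The snippet below is a hedged example of driving a servo with hardware-timed pulses via pigpio's set_servo_pulsewidth; the GPIO pin number and pulse-width limits are assumptions for illustration.

```python
import time
import pigpio  # requires the pigpiod daemon to be running

PAN_PIN = 17                # assumed BCM pin for the pan servo
pi = pigpio.pi()            # connect to the local pigpio daemon

def set_pan(pulse_us):
    # Hobby servos typically accept 500-2500 us pulses; clamp to stay safe.
    pulse_us = max(500, min(2500, pulse_us))
    # pigpio generates this pulse train with hardware timing (DMA),
    # which avoids the jitter of software-timed PWM.
    pi.set_servo_pulsewidth(PAN_PIN, pulse_us)

set_pan(1500)                        # roughly the servo's centre position
time.sleep(1)
pi.set_servo_pulsewidth(PAN_PIN, 0)  # stop sending pulses
pi.stop()
```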

3.2.2 Communication and Synchronization Between Camera Systems

To facilitate communication and synchronization between the camera systems, we opted for a straightforward method rather than incorporating complex communication protocols. We chose to employ GPIO (General Purpose Input/Output) read and write operations for integration, keeping the process simple and streamlined.

Given that our project involved two camera systems, each connected to a Raspberry Pi, we devised a strategy where each Raspberry Pi sets a designated GPIO pin high whenever it detects an object. These pins were cross-connected between the two Raspberry Pis, allowing each to monitor the other's status continuously. We then developed a state machine to govern decision-making. For instance, if one camera detected an object while the other did not, the Raspberry Pi of the inactive camera would recognize this and adjust its camera's orientation to align with the line of sight of the detecting camera. Conversely, when neither camera detected any objects, they were programmed to look in opposite directions, maximizing coverage of the field of view while avoiding redundancy. A sketch of this state machine follows Figure 9.

Figure 9: Wiring and Circuit Diagram for the Four-Servo-Motor Circuit.
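A hedged sketch of this handshake using the RPi.GPIO library follows; the pin numbers, polling period, and the detection hook are assumptions, and the project's actual logic may differ in detail.

```python
import time
import RPi.GPIO as GPIO

DETECT_OUT = 23   # assumed BCM pin: driven high when this Pi detects an object
PEER_IN = 24      # assumed BCM pin: wired to the other Pi's DETECT_OUT

GPIO.setmode(GPIO.BCM)
GPIO.setup(DETECT_OUT, GPIO.OUT, initial=GPIO.LOW)
GPIO.setup(PEER_IN, GPIO.IN, pull_up_down=GPIO.PUD_DOWN)

def object_in_frame():
    # Placeholder: in the real system this flag comes from the vision pipeline.
    return False

def sync_step(we_detect):
    """One iteration of the cross-camera state machine."""
    GPIO.output(DETECT_OUT, we_detect)   # advertise our detection status
    peer_detects = GPIO.input(PEER_IN)   # read the other camera's status

    if we_detect:
        return "track"              # keep following our own target
    if peer_detects:
        return "align_with_peer"    # turn toward the detecting camera's view
    return "scan_opposite"          # look away from the peer for coverage

while True:
    state = sync_step(object_in_frame())
    time.sleep(0.05)                # assumed polling period
```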

3.3 Streaming

IOT Sauron features a relatively comprehensive streaming process, with the system architecture previously depicted in Figure 2. To facilitate streaming on the Raspberry Pi subsystem, we utilize three threads to handle information from the camera on the main Raspberry Pi as well as from the supplementary Raspberry Pi. Each camera thread writes to a buffer, while a main server thread reads from this buffer in a thread-safe manner. Figure 10 below illustrates the threads and how they are used; a sketch of the shared buffer follows the figure.

Figure 10: Thread Description for Each Thread when Streaming.
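The sketch below shows one plausible shape for this thread-safe buffer, using a threading.Condition so the server thread blocks until a camera thread publishes a new frame; the names and structure are illustrative rather than the project's exact implementation.

```python
import threading

class FrameBuffer:
    """Single-slot, thread-safe buffer shared by a camera thread and the server."""

    def __init__(self):
        self.cond = threading.Condition()
        self.frame = None

    def write(self, frame):
        # Camera thread: publish the latest frame and wake any waiting reader.
        with self.cond:
            self.frame = frame
            self.cond.notify_all()

    def read(self):
        # Server thread: block until a frame is available, then return it.
        with self.cond:
            self.cond.wait_for(lambda: self.frame is not None)
            return self.frame

buffers = [FrameBuffer(), FrameBuffer()]  # one per camera stream

def camera_worker(camera, buf):
    while True:
        buf.write(camera.capture())  # hypothetical capture call

# The main server thread then calls buf.read() on each buffer when
# assembling the HTTP response for connected clients.
```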

After the threads post information through HTTP, clients on external IP addresses can access the rendered HTML. This relied on the official Pi camera streaming documentation, owing to its reliability and usability [3]. This method was chosen because the lab setup required a 32-bit operating system, and other solutions such as Streamlit were incompatible with the 32-bit OS. Figure 11 provides a screenshot of the running website featuring the system's two camera streams; a condensed sketch of the streaming handler follows the figure.

Figure 11: Website Featuring Multiple Camera Streams and HTML Information as Accessed from an External Laptop.
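For reference, below is a condensed sketch of the multipart-MJPEG pattern used in that recipe [3], adapted to serve frames from the FrameBuffer sketch above (here assumed to hold JPEG-encoded bytes); the route name, port, and boundary string are assumptions.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# 'buffers' refers to the FrameBuffer list from the earlier sketch.

class StreamHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/stream.mjpg":    # assumed route name
            self.send_error(404)
            return
        # multipart/x-mixed-replace makes the browser replace each JPEG
        # in place, yielding a continuously updating image.
        self.send_response(200)
        self.send_header("Content-Type",
                         "multipart/x-mixed-replace; boundary=FRAME")
        self.end_headers()
        try:
            while True:
                jpeg = buffers[0].read()   # JPEG-encoded frame bytes
                self.wfile.write(b"--FRAME\r\n")
                self.send_header("Content-Type", "image/jpeg")
                self.send_header("Content-Length", str(len(jpeg)))
                self.end_headers()
                self.wfile.write(jpeg)
                self.wfile.write(b"\r\n")
        except (BrokenPipeError, ConnectionResetError):
            pass                           # client disconnected

ThreadingHTTPServer(("", 8000), StreamHandler).serve_forever()
```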

3.4 Testing Process

This project involved breaking down the system into multiple subsystems, each with its own functional requirements. We employed an integration-testing approach, starting with individual subsystems and building up to a comprehensive system test. This method facilitates fault localization at various levels.

Figure 12: Testing Process and Functional Requirements for Each System of the Project.

  • Modular Testing: This testing was conducted on the Raspberry Pi board, utilizing a white-box approach, which involves a thorough understanding of the underlying code structure and logic flow [4]. The testing process for each subsystem was as follows.
    1. Camera Testing: This involved running a Python script to determine whether the OpenCV libraries could be imported and whether faces and colours were adequately detected.
    2. Servo Testing: After running a Python script, we checked the hardware with an oscilloscope to determine whether PWM signals were being sent, and confirmed that LEDs flashed to signify that GPIO triggers were present.
    3. Web-Server Testing: An external laptop was used to receive the streamed information and to determine whether the IP address was reachable through web browsers or ping tests. In this phase, the external laptop could also provide mock camera-stream information to be streamed over the web.
  • Inter-Module Testing: Following the completion of the modular testing phase, a series of inter-module tests was completed, as depicted in the second row of Figure 12.
  • Integration Testing: After inter-module testing was conducted, the full system was tested while observing performance-related metrics. This phase involved having multiple clients log onto the website to receive stream information.

4. Results

During the final stage, we measured the latency and CPU utilization of the tasks in our embedded system. Tasks were divided into several categories, including face detection, colour detection, and TCP server send time. From Figure 13, it can be seen that face detection takes approximately 0.755 seconds, which is longer than the time taken for colour detection. When posting information to the website, the time taken depends on both the TCP server time and either colour or face detection. Based on the timing results, colour detection was selected as the optimal approach for a relatively real-time system. Meanwhile, we ran the htop command to determine the CPU utilization of the CPU cores when running these three threads.
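A simple way to collect such per-task latencies is to wrap each stage with a monotonic clock, as sketched below; the stage names are placeholders standing in for the project's actual routines.

```python
import time

def timed(label, fn, *args):
    """Run fn(*args), print its wall-clock latency, and return its result."""
    start = time.monotonic()  # monotonic clock is immune to NTP adjustments
    result = fn(*args)
    print(f"{label}: {time.monotonic() - start:.3f} s")
    return result

# Hypothetical stage names standing in for the project's real routines:
# faces = timed("face detection", detect_faces, frame)
# centroid = timed("colour detection", largest_blue_centroid, frame)
# timed("tcp send", send_frame_to_server, frame)
```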

Figure 13: Measured Latency for Face Detection, Colour Detection, and TCP Server Send Tasks.

Figure 14: CPU Utilization of Each Core Using the htop Command.

From Figure 14, it can be seen that the CPU utilization of each core is relatively high, with one core at approximately 70.3% utilization. This shows the relatively high processing power and memory needed to run computer vision tasks and the associated threads. When an additional camera was simulated (using a laptop as another stream), CPU utilization increased even further, with cores often exceeding 90% utilization, suggesting that the architecture would be unable to handle and synchronize many more camera streams.

5. Issues and Diagnostics

Over the course of the project, there were several issues that we faced and needed to adapt and respond to:

1. Installation of OpenCV: Installing OpenCV proved rather difficult, as it was not possible to perform a regular pip install of OpenCV on the Raspberry Pi. Instead, we ran the command "sudo apt-get install python3-opencv", which installs a pre-compiled version of OpenCV on the Raspberry Pi.
2. Video Streaming: Initially, the group tried to use the WebRTC and Streamlit platforms to stream the computer vision feed from the Raspberry Pi. However, this faced several issues, as Streamlit generally requires a 64-bit operating system. To overcome this problem, we used the HTTP streaming compatible with Raspberry Pi cameras, which offered a simple and lightweight means of streaming video.
3. Servo-Motor Tuning: Achieving optimal, smooth servo motor motion presented the team with several challenges. It was difficult to tune the PD controller so that the motors would neither overshoot nor undershoot. This was exacerbated by the fact that the proportional and derivative terms need to change depending on the speed of the code, with lower values for faster code and vice versa. Eventually, through iterative testing, we obtained suitable PD values for all relevant terms.

6. Conclusion

As part of this project, a comprehensive multi-camera streaming system was developed, featuring object detection, servo motor control, and client-side camera viewing. Despite some initial issues with detection and streaming latency, the results were impressive, particularly given the limitations of using Raspberry Pi 3 and 4 models with at most 2 GB of RAM. Notably, streamed data is consistently viewable for all clients with relatively low latency, and there is marginal CPU capacity to spare. This is primarily due to the use of multithreading, which minimizes system resource utilization and enhances concurrency. Altogether, the project was highly successful, accomplishing all delineated objectives as specified in the initial project plan.

7. Evaluation and Future Works

While the project was largely successful, there are further aspects that could be optimized and improved upon. First, the streaming latency of the system could be reduced, for example by using a UDP architecture or by compressing images and measuring the resulting difference in transmission time. Second, the code could be converted from Python to a C++-based architecture, which would likely reduce CPU utilization and latency. Finally, interviews could be conducted with users and CCTV camera owners to determine which features they currently enjoy about their cameras and which could provide added value.

8. Budget

Overall, the project did not require any external components to be purchased, as the parts were readily available in the lab. However, a breakdown of components and costs is provided in Figure 15, which shows that approximately $121.95 would be needed to assemble this project from scratch.

Figure 15: Budget Estimate for Overall Project.

9. Work Breakdown

The project comprised four main components: detection, live video streaming, servo control, and inter-system communication. James primarily focused on managing the streaming and detection aspects, implementing multithreading techniques to enhance object detection within frames. Meanwhile, Rahul took charge of controlling and fine-tuning the PD control of the servo motors, wiring the entire system, and establishing communication between the different camera systems. While Rahul and James led key components, Mohammad ensured the smooth operation of the multiple systems, prioritizing integration and overall functionality. Together, we strived to ensure optimal performance given the available compute and resources.

10. Acknowledgments

We would like to thank Professor Skovira and all of the TAs for their constant support and guidance throughout this project. Their assistance was invaluable in helping us reach our objectives. In this project, we also drew on a series of student projects for ideas and inspiration on report formatting and code development [5, 6].

11. Bibliography

1. E. Fatima, "What are Haar-like features?," Educative. Accessed: May 15, 2024. [Online]. Available: https://www.educative.io/answers/what-are-haar-like-features
2. D. Adakane, "What are Haar Features used in Face Detection?," Analytics Vidhya. Accessed: May 15, 2024. [Online]. Available: https://medium.com/analytics-vidhya/what-is-haar-features-used-in-face-detection-a7e531c8332b
3. "Video Streaming Raspberry Pi Camera | Random Nerd Tutorials." Accessed: May 15, 2024. [Online]. Available: https://randomnerdtutorials.com/video-streaming-with-raspberry-pi-camera/
4. V. Basili and R. Selby, "Comparing the Effectiveness of Software Testing Strategies," IEEE Transactions on Software Engineering, SE-13(12), 1278-1296 (1987).
5. V. Fang and X. Liang, "GestureHome." Accessed: May 15, 2024. [Online]. Available: https://tinyurl.com/yj5rs4bh
6. J. Ong, M. Setiawan, C. Louie, H. J. Mang, H. Joshi, M. Paczai, and A. Yau, "Unity Mars Rover Assembly," Jun. 2022. [Online]. Available: https://github.com/JamesOngICL/Unity-Mars-Rover